Learning String Edit Distance

Authors

Eric Sven Ristad

Peter N. Yianilos

Abstract

In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn the optimal string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance function with one third the error rate of the untrained Levenshtein distance.
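For readers unfamiliar with the baseline the abstract mentions, the following is a minimal sketch of the untrained Levenshtein distance, where every insertion, deletion, and substitution costs 1. It is an illustration only, not the authors' learned stochastic model; the function name `levenshtein` and its parameter names are assumptions chosen for this example.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn source into target.

    Illustrative unit-cost baseline only; this is not the learned
    distance described in the abstract.
    """
    # previous[j] is the distance between the source prefix processed
    # so far (initially empty) and the first j characters of target.
    previous = list(range(len(target) + 1))

    for i, s_char in enumerate(source, start=1):
        # Turning the first i source characters into the empty string
        # takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

A learned edit distance would, roughly speaking, replace the fixed unit costs in this recurrence with costs estimated from example string pairs; how that estimation is done is the subject of the paper and is not shown here.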

Similar resources

Learning String Edit Distance 1

MAUL: Machine Agent User Learning

We describe implementation of a classifier for User-Agent strings using Support Vector Machines. The best kernel is found to be the linear kernel, even when more complicated string based kernels, such as the edit distance kernel and the subsequence kernel, are employed. A robust tokenization scheme is employed which dramatically speeds up the calculation for the edit string and subsequence kern...

Learning Balls of Strings from Edit Corrections

When facing the question of learning languages in realistic settings, one has to tackle several problems that do not admit simple solutions. On the one hand, languages are usually defined by complex grammatical mechanisms for which the learning results are predominantly negative, as the few algorithms are not really able to cope with noise. On the other hand, the learning settings themselves re...

Learning state machine-based string edit kernels

During the past few years, several works have been done to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. On the other hand, the marginalized kernels allow the computation of the joint similarity between two instances by summing condit...

Detecting English-French Cognates Using Orthographic Edit Distance

Identification of cognates is an important component of computer assisted second language learning systems. We present a simple rule-based system to recognize cognates in English text from the perspective of the French language. At the core of our system is a novel similarity measure, orthographic edit distance, which incorporates orthographic information into string edit distance to compute th...

Journal:

IEEE Trans. Pattern Anal. Mach. Intell.

Volume: 20, Issue:

Pages: -

Publication date: 1997
